1 Abstract

We processed the exomeSeq VCF files associated with the JHU-Biobank data to produce Mutation Annotation Format (MAF) files using Ensembl’s VEP (Variant Effect Predictor) tool. The MAF data from these files are stored in syn20546180.

1.0.1 Sample information

  • Variant Data: 2_009 had VCF files for the Blood and Neurofibroma samples only. Per David Mohr’s email, the 2_009_MPNST_TD data was not released because it did not pass QC.
  • ExomeSeq Files: 2_009 had BAM files for the Blood, NF, and MPNST samples.

Accordingly, the first half of this document shows the variant information for the Blood and NF samples, and the second half shows copy ratio analysis at the chromosomal level for the Blood, NF, and MPNST samples.

  • Genotype of Sample 2_009 => +/- (Het)

2 Variant Analysis

2.0.1 Variants in genes of interest:

The oncoplot below shows the types of variants found in genes of interest listed by the Pratilas lab. The Variant Classification is shown as a legend below the plot.

## [1] "Mutations in our genes of interest"
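
As a rough illustration of what the oncoplot summarizes, the sketch below tallies variant classifications per sample for a gene list. It is written in Python for illustration only and is not the code behind this report. The column names `Hugo_Symbol`, `Tumor_Sample_Barcode`, and `Variant_Classification` are standard MAF fields; the gene set is taken from the genes named in the lollipop-plot output of this document and may not be the complete list from the Pratilas lab; the function name and the input rows are hypothetical placeholders, not data from sample 2_009.

```python
from collections import Counter

# Genes named in the lollipop-plot section of this report (assumed here to
# stand in for the genes-of-interest list, which is not reproduced in full).
GENES_OF_INTEREST = {
    "NF1", "TP53", "EZH2", "EGFR", "PDGFRA", "CCND3", "KDR",
    "FLT4", "FGFR4", "AXL", "AURKA", "APC", "ATM", "SMARCA2",
}


def classify_by_sample(maf_rows, genes=GENES_OF_INTEREST):
    """Count (sample, gene, classification) triples for the chosen genes."""
    counts = Counter()
    for row in maf_rows:
        if row["Hugo_Symbol"] in genes:
            key = (
                row["Tumor_Sample_Barcode"],
                row["Hugo_Symbol"],
                row["Variant_Classification"],
            )
            counts[key] += 1
    return counts


# Made-up placeholder rows for demonstration; NOT results from sample 2_009.
toy_rows = [
    {"Hugo_Symbol": "NF1", "Tumor_Sample_Barcode": "toy_blood",
     "Variant_Classification": "Nonsense_Mutation"},
    {"Hugo_Symbol": "NF1", "Tumor_Sample_Barcode": "toy_nf",
     "Variant_Classification": "Nonsense_Mutation"},
    {"Hugo_Symbol": "TTN", "Tumor_Sample_Barcode": "toy_nf",
     "Variant_Classification": "Missense_Mutation"},
]

summary = classify_by_sample(toy_rows)
for (sample, gene, classification), n in sorted(summary.items()):
    print(sample, gene, classification, n)
```

Rows for genes outside the list (the `TTN` placeholder here) are simply ignored, which mirrors restricting an oncoplot to a fixed gene panel.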

2.0.2 Allele frequency

The allele frequency of the specific variants according to gnomAD can be found below:

2.0.3 Putative variant location:

The series of lollipopPlots below show the putative location and amino-acid information associated with the variants in the above genes of interest.

In each plot, the top lollipop refers to the variant in the normal Blood sample and the bottom one to the variant in the NF sample. The gene name and the selected transcript ID (beginning with “NM_”) are located in the top right-hand corner of each plot. When more than one transcript is found for a gene, the longest transcript is used for the visualization (the selected one is highlighted in the top right-hand corner).
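
The longest-transcript rule can be sketched as follows. This is a minimal Python illustration, not the plotting tool's own implementation: the function name `pick_longest_transcript` and the dictionary keys are hypothetical, and the tie-breaking rule (first listed transcript wins) is an assumption, since this document does not show which transcript was highlighted when several share the maximum length. The RefSeq IDs and lengths are copied from the NF1 and TP53 tables printed in this section.

```python
def pick_longest_transcript(transcripts):
    """Return the record with the largest aa_length, or None if the list is empty.

    max() returns the first maximal element, so the first listed transcript
    wins a tie (an assumed tie-breaking rule).
    """
    if not transcripts:
        return None
    return max(transcripts, key=lambda t: t["aa_length"])


# Values copied from the tables printed in this section.
nf1 = [
    {"refseq_id": "NM_001042492", "aa_length": 2839},
    {"refseq_id": "NM_000267", "aa_length": 2818},
]
tp53 = [
    {"refseq_id": "NM_000546", "aa_length": 393},
    {"refseq_id": "NM_001126112", "aa_length": 393},
    {"refseq_id": "NM_001126118", "aa_length": 354},
]

nf1_choice = pick_longest_transcript(nf1)
tp53_choice = pick_longest_transcript(tp53)
no_choice = pick_longest_transcript([])  # a gene with no transcript table

print(nf1_choice["refseq_id"], tp53_choice["refseq_id"], no_choice)
```

For TP53 the two 393-residue transcripts tie, so the result depends entirely on the assumed tie-break.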

One caveat of these plots is that when a protein has two overlapping domains, the domain labels overlap as well. The font size was reduced slightly for readability, but some overlaps were unavoidable. We are currently exploring other visualization tools to address this.

## Gene: NF1
##    HGNC    refseq.ID   protein.ID aa.length
## 1:  NF1 NM_001042492 NP_001035957      2839
## 2:  NF1    NM_000267    NP_000258      2818

## Gene: TP53
##    HGNC    refseq.ID   protein.ID aa.length
## 1: TP53    NM_000546    NP_000537       393
## 2: TP53 NM_001126112 NP_001119584       393
## 3: TP53 NM_001126118 NP_001119590       354
## 4: TP53 NM_001126115 NP_001119587       261
## 5: TP53 NM_001126113 NP_001119585       346
## 6: TP53 NM_001126117 NP_001119589       214
## 7: TP53 NM_001126114 NP_001119586       341
## 8: TP53 NM_001126116 NP_001119588       209

## Gene: EZH2
##    HGNC    refseq.ID   protein.ID aa.length
## 1: EZH2 NM_001203249 NP_001190178       695
## 2: EZH2 NM_001203248 NP_001190177       737
## 3: EZH2    NM_152998    NP_694543       707
## 4: EZH2 NM_001203247 NP_001190176       746
## 5: EZH2    NM_004456    NP_004447       751

## Gene: EGFR
##    HGNC refseq.ID protein.ID aa.length
## 1: EGFR NM_005228  NP_005219      1210
## 2: EGFR NM_201284  NP_958441       705
## 3: EGFR NM_201282  NP_958439       628
## 4: EGFR NM_201283  NP_958440       405

## Gene: PDGFRA

## Gene: CCND3
##     HGNC    refseq.ID   protein.ID aa.length
## 1: CCND3 NM_001136125 NP_001129597       220
## 2: CCND3    NM_001760    NP_001751       292
## 3: CCND3 NM_001136017 NP_001129489       211
## 4: CCND3 NM_001136126 NP_001129598        96

## Gene: KDR

## Gene: FLT4
##    HGNC refseq.ID protein.ID aa.length
## 1: FLT4 NM_182925  NP_891555      1363
## 2: FLT4 NM_002020  NP_002011      1298

## Gene: FGFR4
##     HGNC refseq.ID protein.ID aa.length
## 1: FGFR4 NM_002011  NP_002002       802
## 2: FGFR4 NM_213647  NP_998812       802
## 3: FGFR4 NM_022963  NP_075252       762

## Gene: AXL
##    HGNC refseq.ID protein.ID aa.length
## 1:  AXL NM_021913  NP_068713       894
## 2:  AXL NM_001699  NP_001690       885

## Gene: AURKA
##     HGNC refseq.ID protein.ID aa.length
## 1: AURKA NM_003600  NP_003591       403
## 2: AURKA NM_198433  NP_940835       403
## 3: AURKA NM_198434  NP_940836       403
## 4: AURKA NM_198435  NP_940837       403
## 5: AURKA NM_198436  NP_940838       403
## 6: AURKA NM_198437  NP_940839       403

## Gene: APC
##    HGNC    refseq.ID   protein.ID aa.length
## 1:  APC NM_001127511 NP_001120983      2825
## 2:  APC NM_001127510 NP_001120982      2843
## 3:  APC    NM_000038    NP_000029      2843

## Gene: ATM

## Gene: SMARCA2
##       HGNC refseq.ID protein.ID aa.length
## 1: SMARCA2 NM_003070  NP_003061      1590
## 2: SMARCA2 NM_139045  NP_620614      1572

3 Copy Ratio Analysis

Currently, our copy ratio analysis captures information at the gross chromosomal level, not at the gene level. A few observations emerge from the above plot:

  • The MPNST sample data is extremely noisy. Referring back to David Mohr’s email, VCF data for this sample was not released because it did not pass their QC. The plot is consistent with that: the BAM file that was released (probably by mistake) contains extremely noisy data. Conclusions about this sample should be drawn from the above plot only with great caution.

  • The plots for the Blood and Neurofibroma samples look fairly similar, with the exception of possible changes in copy ratio in Chr 9, 10, and 11.
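
To make "copy ratio at the chromosomal level" concrete, here is a minimal Python sketch of one common way to compute it: a per-chromosome log2 ratio of read counts in a test sample against a reference sample, after scaling each by its total read count. This is an assumed approach for illustration; the document does not state which method or tool produced the plot. The function name, the pseudocount, and the read counts below are all hypothetical and are not measurements from sample 2_009.

```python
import math


def log2_copy_ratios(test_counts, reference_counts, pseudocount=1.0):
    """Per-chromosome log2(test / reference) after library-size scaling.

    Both arguments map chromosome name -> read count. The pseudocount
    (an assumed value) avoids division by zero for empty chromosomes.
    """
    test_total = sum(test_counts.values())
    reference_total = sum(reference_counts.values())
    ratios = {}
    for chrom, count in test_counts.items():
        if chrom not in reference_counts:
            continue  # skip chromosomes missing from the reference
        test_frac = (count + pseudocount) / test_total
        ref_frac = (reference_counts[chrom] + pseudocount) / reference_total
        ratios[chrom] = math.log2(test_frac / ref_frac)
    return ratios


# Invented counts for demonstration only.
toy_reference = {"chr9": 1000, "chr10": 1000, "chr11": 1000}
toy_same = {"chr9": 1000, "chr10": 1000, "chr11": 1000}
toy_loss = {"chr9": 500, "chr10": 1000, "chr11": 1000}

flat = log2_copy_ratios(toy_same, toy_reference)
shifted = log2_copy_ratios(toy_loss, toy_reference)
print(flat)
print(shifted)
```

A value near 0 indicates no change relative to the reference, a negative value suggests relative loss, and a positive value suggests relative gain. Because the scaling uses total reads, a large loss on one chromosome slightly raises the ratios of the others, which is one reason chromosome-level plots should be read as relative rather than absolute copy number.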